SEGMENTING AN IMAGE VIA A GRAPH



[001] The present application claims, under 35 U.S.C. § 119, the priority 
benefit of European Patent Application No. 02079880.7 filed November 22, 2002, the 
entire contents of which are herein fully incorporated by reference. 

BACKGROUND OF THE INVENTION 

Field of the Invention 

[002] The invention relates to a method of segmenting a composite image of 
pixels into a number of fields corresponding to lay-out elements of the image, the 
pixels having a value representing the intensity and/or color of a picture element. 
The invention further relates to a device implementing the method, which device 
comprises an input unit for inputting an image, and a processing unit. 

Discussion of Background Art 

[003] Several methods for segmenting a composite image, such as a 
document including text and figures, to identify fields corresponding to layout 
elements, are known in the art, and a common approach is based on processing the 
background. The image is represented by pixels that have a value representing the 
intensity and/or color of a picture element. This value is classified as background 
(usually white) or foreground (usually black, being printed space). The white 
background space that surrounds the printed regions on a page is analyzed. 

[004] A method for page segmentation is known from the article "Image 
Segmentation by Shape-Directed Covers" by H.S. Baird et al. in Proceedings 10th
International Conference on Pattern Recognition, Atlantic City, NJ, June 1990, pp.
820-825. According to this method, in an image to be analyzed, a set of maximal 
rectangles of background pixels is constructed, a maximal rectangle being a 
rectangle that cannot be enlarged without including a foreground pixel. Segmentation 
of the image into information-bearing fields, i.e. text columns, is achieved by 
covering the total image with a reduced set of the maximal rectangles. The 
remaining 'uncovered' area is considered foreground and may be used for further 
analysis. A problem of this method is that the fields are defined as areas in the pixel
domain, which does not allow computationally efficient further processing. 

[005] U.S. 6,470,095 discloses a method of page segmentation in which text 
areas are first preprocessed in a number of processing steps, to construct closed 
areas, called "enclosure blobs", of black pixels. In the remaining white spaces, bands 
of white space having a maximal length are constructed by suppressing bands of 
white space adjacent to a longer band. The final bands of white space, horizontal
and vertical, are then replaced by their midlines. Finally, the junctions between
horizontal and vertical midlines are detected, and loose ends are cut off. The 
remaining midline sections are used as delimiters of text fields. This known method 
involves a large number of processing steps and may in some instances give 
inaccurate results, when white spaces connect, but their midlines do not. 

[006] Another method for page segmentation is known from the article
"Flexible page segmentation using the background" by A. Antonacopoulos and R.T.
Ritchings in Proceedings 12th International Conference on Pattern Recognition,
Jerusalem, Israel, October 9-12, IEEE-CS Press, 1994, vol. 2, pp. 339-344. According
to this method, the background white space is covered with tiles, i.e. non-overlapping 
areas of background pixels. 

[007] The contour of a foreground field in the image is identified by tracing 
along the white tiles that encircle it, such that the inner borders of the tiles constitute 
the border of a field for further analysis. A problem of this method is that the borders 
of the fields are represented by a complex description which frustrates an efficient 
further analysis. 

SUMMARY OF THE INVENTION 
[008] It is an object of the invention to provide a method and device for 
segmenting an image which is more efficient, and in particular delivers a simple 
description of the segmented image that can easily be used in further processing 
steps. 

[009] According to a first aspect of the invention, a method of segmenting an 
image of pixels into a number of fields, includes constructing separating elements 
corresponding to rectangular areas of adjacent pixels having a background property 
indicative of a background of the image; constructing a graph representing the
lay-out structure of the image by defining vertices of the graph on the basis of
intersections of separating elements that are substantially oriented in predetermined
separation directions, in particular horizontal and vertical direction, and defining
edges of the graph between the vertices corresponding to the field separators; and
defining field separators corresponding to the edges of the graph.

[010] According to a second aspect of the invention, a device for segmenting
an image of pixels into a number of fields corresponding to lay-out elements of the 
image, the pixels having a value representing the intensity and/or color of a picture 
element, includes: an input unit for inputting an image; and a processing unit for 
constructing a graph representing the lay-out structure of the image by constructing 
separating elements corresponding to rectangular areas of adjacent pixels having a 
background property indicative of a background of the image, defining vertices of the 
graph based on intersections of separating elements that are substantially oriented 
in different separation directions, in particular horizontal and vertical direction, and 
defining edges of the graph between the vertices corresponding to the separating 
elements. 

[011] According to a third aspect of the invention, a computer program 
product for performing the method of the present invention is provided. 

[012] The advantage of constructing the graph is that the edges provide a 
compact and efficient representation of the borders of the fields. Further analysis of 
the fields based on the graph is computationally efficient. 

[013] The invention is also based on the following recognition. A graph 
representation has been proposed but rejected as being too complex in 
segmentation in the article by A. Antonacopoulos as described above. The inventors 
have seen that the graph of Antonacopoulos does not represent the fields at all, but
only provides a representation of the background tiles in the image and their 
adjacency. The graph constructed according to the invention, however, directly 
covers the fields based on the structure of the background, and therefore provides a 
representation on a high level of the fields in the layout of the image. 

[014] It is noted that a graph representation is used for representing the
layout of a document by Y. Belaid et al., "Item searching in forms: application to
french tax form", Document analysis and recognition, 1995, Proceedings of the third
international conference on Montreal, Que., Canada, 14-16 Aug. 1995, Los Alamitos,
CA, USA, IEEE Comput. Soc, US, 14 August 1995 (1995-08-14), pp. 744-747,
XP010231002, ISBN: 0-8186-7128-9. However, according to this disclosure, a graph
is constructed from existing black lines in the document that frame fields that may or
may not contain text. Thus, no use is made of the text areas and white spaces in the
document image, and this known method would be useless for documents not
having black frame lines.

[015] In an embodiment, the step of defining vertices comprises constructing 
subsets of separating elements that are substantially oriented in the predetermined 
separation directions, and determining the intersections between pairs of separating 
elements from both subsets. This has the advantage that the vertices in the graph 
are constructed in an efficient way. 

[016] In a further embodiment, the method comprises constructing a set of 
maximal rectangles, a maximal rectangle being a rectangular part of the image in 
one of the separation directions, that has the maximum possible area without 
including a pixel not having the background property indicative of a background of 
the image, and constructing the separating elements in a cleaning step wherein at 
least one pair of overlapping maximal rectangles in the set is replaced by an 
informative rectangle that is a rectangular part of an area combining the areas of the 
pair, which rectangular part has the maximum possible length in the relevant 
separation direction. 

[017] This has the effect that separating elements that are long and narrow 
along a separation direction are constructed efficiently. The advantage is that 
separating elements most informative for separating fields are constructed and fields 
enclosed by the separating elements are detected easily. Although initially a large
number of maximal rectangles are found, the cleaning step efficiently reduces said
number so that a computationally efficient procedure for construction of the 
separating elements is possible. 

[018] In an embodiment of the method, prior to constructing the maximal 
rectangles, the image is filtered by detecting foreground separator elements that are 
objects in the foreground of the image having a pattern of pixel values deviating from 
said background property, in particular black lines or dashed or dotted lines, and by 
replacing pixels of the detected foreground separators by pixels having the 
background property. The effect of replacing foreground separators by the 
background color is that larger and more relevant areas of background are formed. 
The advantage is that these larger background areas are obtained without additional
computational steps. This results in larger maximal rectangles, which improves the
quality of the resulting segmentation.

[019] These and other objects of the present application will become more 
readily apparent from the detailed description given hereinafter. However, it should 
be understood that the detailed description and specific examples, while indicating 
preferred embodiments of the invention, are given by way of illustration only, since 
various changes and modifications within the spirit and scope of the invention will 
become apparent to those skilled in the art from this detailed description. 

BRIEF DESCRIPTION OF THE DRAWINGS 
[020] These and other aspects of the invention will be apparent from and
elucidated further with reference to the embodiments described by way of example in
the following description and with reference to the accompanying drawings, in which:

[021] Figure 1 shows an overview of an exemplary segmentation method
usable in the present invention,

[022] Figure 2 shows a part of a sample Japanese newspaper,

[023] Figure 3 shows the merging of objects along a single direction,

[024] Figure 4 shows segmentation and two directional merging of objects
according to an embodiment of the present invention,

[025] Figure 5 shows construction of a maximal rectangle from white runs,

[026] Figure 6 shows construction of maximal white rectangles,

[027] Figure 7 shows cleaning of overlapping maximal white rectangles,

[028] Figure 8 shows a graph on a newspaper page,

[029] Figure 9 shows two types of intersection of maximal rectangles, and

[030] Figure 10 shows a device for segmenting a picture according to an
embodiment of the present invention.

[031] The figures are diagrammatic and not drawn to scale. In the figures,
elements which correspond to elements already described have the same reference
numerals.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 
[032] Figure 1 shows an overview of an exemplary segmentation method 
showing three basic steps from known segmentation systems. Referring to Figure 1, 
an input image 11 is processed in a CCA (Connected Component Analysis) module 
14 that analyses the pixels of the image using Connected Component Analysis. First
an original picture that may be a black-and-white, grayscale or coloured document, 
e.g. a newspaper page, is scanned, preferably in gray scale. Grayscale scanned 
pictures are halftoned for assigning a foreground value (e.g. black) or a background 
value (e.g. white) to each pixel. The CCA module 14 finds foreground elements in 
the image by detecting connected components (CC) of adjacent pixels having similar 
properties. An example of these first steps in the segmentation process is
described, for instance, in U.S. Patent No. 5,856,877.

[033] The CCA module 14 produces as output CC Objects 12 that are 
connected components of connected foreground pixels. An LA (Layout Objects) 
module 15 receives the CC Objects 12 as input and produces Layout Objects 13 by 
merging and grouping the CC Objects 12 to form larger layout objects 13 such as 
text lines and text blocks. During this phase, heuristics are used to group layout 
elements to form larger layout elements. This is a logical step in a regular bottom-up 
procedure. An AF (Article Formation) module 16 then receives the Layout Objects 13 
as input and produces Articles 17 as output by article formation. In this module 16, 
several layout objects that constitute a larger entity are grouped together. The larger 
entity is assembled using layout rules that apply to the original picture. For example 
in a newspaper page, the AF module 16 groups the text blocks and graphical 
elements like pictures to form the separate articles, according to the layout rules of 
that specific newspaper style. Knowledge of the layout type of the image, e.g. 
Western type magazine, Scientific text or Japanese article layouts, can be used for a 
rule-based approach of article formation resulting in an improved grouping of text 
blocks. 

[034] According to the invention, additional steps are added to the 
segmentation as described below. These steps relate to segmentation of the image 
into fields before detecting elements within a field, i.e. before forming layout objects 
that are constituted by smaller, separated but interrelated items. Figure 2 shows a 
sample Japanese newspaper. Such newspapers have a specific layout that includes 
text lines in both the horizontal reading direction 22 and the vertical reading direction 
21. The problem for a traditional bottom-up grouping process of detected connected 
components is that it is not known in which direction the grouping should proceed. 
Hence the segmentation is augmented by an additional step of processing the 
background for detecting the fields in the page. Subsequently the reading direction 
for each field of the Japanese paper is detected before the grouping of characters is
performed. 

[035] In an embodiment of the present method, separator elements, e.g. 
black lines 23 for separating columns are detected and converted into background 
elements. With this option, it is possible to separate large elements of black lines 23 
containing vertical and horizontal lines that are actually connected into different 
separator elements. In Japanese newspapers, lines are very important objects for 
separating fields in the layout. It is required that these objects are recognized as 
lines along separation directions. Without this option, these objects would be 
classified as graphics. Using the option of the present invention, the lines can be 
treated advantageously as separator elements in the different orientations separately 
for each separation direction. 

[036] Figure 3 shows a basic method of merging objects in a single direction. 
This figure depicts the basic function of the LA module 15 for finding the layout 
objects oriented in a known direction, such as text blocks for the situation that the 
reading order is known. Referring to Figure 3, connected components (CC objects) 
12 are processed in a first, analysis step 31 by statistical analysis resulting in 
computed thresholds 32. In a second, classification step 33 the CC-classification is 
corrected resulting in the corrected connected components 34, which are processed 
in a third, merging step 35 to join characters to text lines, resulting in text lines and 
other objects 36. In a fourth, text merging step 37 the text lines are joined to text 
blocks 38 (and possibly other graphical objects). According to the requirements for 
Japanese newspapers, the traditional joining of objects must be along at least two
reading directions, and the basic method described above must be improved 
therefor. 

[037] Figure 4 shows segmentation and two directional joining of objects 
according to an embodiment of the present invention. In this embodiment, new 
additional steps have been added compared to the single directional processing in 
Figure 3. Referring to Figure 4, in a first (pre-) processing step, a graph 41 of the 
image is constructed. The construction of the graph by finding field separators is 
described below. In the graph, fields are detected in a field detection step 42 by 
finding areas that are enclosed by edges of the graph. The relevant areas are 
classified as fields containing text blocks 47. In the text block 47 (using the 
connected components 43 or corrected connected components 34 that are in the 
text block area), the reading order 45 is determined in a step 44. The reading
direction detection is based upon the document spectrum. Using the fields of the 
text blocks 47, the contained connected components 43 and the reading order 45 as 
input, the Line Build step 46 joins the characters to lines as required along the 
direction found. 

[038] Now the construction of the graph 41 is described. A graph
representation of a document is created using the background of a scan. Pixels in
the scan are classified as background (usually white) or foreground (usually black). 
Because only large areas of white provide information on fields, small noise objects 
are removed, e.g. by down-sampling the image. The down-sampled image may 
further be de-speckled to remove single foreground (black) pixels. 
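The noise removal described above can be sketched as follows. This is only an illustration: the text does not specify how the down-sampling is done, so the majority rule per block and the "no foreground neighbour" speckle test are assumptions, as are the function names `downsample` and `despeckle`. Images are lists of rows with 0 for background and 1 for foreground.

```python
def downsample(image, factor):
    """Shrink a binary image by an integer factor.

    Assumption: a block becomes foreground only if more than half of
    its pixels are foreground, so small noise objects vanish.
    """
    height, width = len(image), len(image[0])
    out = []
    for by in range(0, height - height % factor, factor):
        row = []
        for bx in range(0, width - width % factor, factor):
            block = [image[y][x]
                     for y in range(by, by + factor)
                     for x in range(bx, bx + factor)]
            row.append(1 if 2 * sum(block) > len(block) else 0)
        out.append(row)
    return out


def despeckle(image):
    """Turn single foreground pixels (no foreground 8-neighbour) into background."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            # Sum the 3x3 window and subtract the pixel itself.
            neighbours = sum(
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ) - 1
            if neighbours == 0:
                out[y][x] = 0
    return out


scan = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
small = downsample(scan, 2)
clean = despeckle([[1, 1, 0], [0, 0, 0], [0, 0, 1]])
```

In this sketch the lone pixel in the bottom right corner of `scan` does not survive down-sampling, and `despeckle` removes the isolated pixel while keeping the two adjacent ones.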

[039] The next task is to extract the important white areas. The first step of 
this task is to detect so-called white runs, one pixel high areas of adjacent 
background pixels. White runs that are shorter than a predetermined minimal length 
are excluded from the processing. 
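A minimal sketch of the white-run detection is given below. It assumes a binary image stored as a list of rows (0 for background, 1 for foreground) and half-open rectangle coordinates, so that a run one pixel high at row y has y_2 = y + 1; the function name `white_runs` and the parameter `min_length` are illustrative and not taken from the text.

```python
def white_runs(image, min_length=1):
    """Return one-pixel-high background runs as ((x1, y1), (x2, y2)).

    x2 and y2 lie one past the last white pixel (half-open convention,
    an assumption of this sketch). Runs shorter than min_length are dropped.
    """
    runs = []
    for y, row in enumerate(image):
        start = None
        # A foreground sentinel closes a run that reaches the right border.
        for x, pixel in enumerate(list(row) + [1]):
            if pixel == 0 and start is None:
                start = x
            elif pixel != 0 and start is not None:
                if x - start >= min_length:
                    runs.append(((start, y), (x, y + 1)))
                start = None
    return runs


page = [
    [0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0],
]
runs = white_runs(page, min_length=3)
```

With `min_length=3` the single white pixel at the end of the first row is excluded, leaving one run per row.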

[040] Figure 5 shows, as an example, four horizontal runs 51 of white pixels, 
that are adjacent to each other in the vertical direction. As shown in Figure 5, 
foreground area 53 is assumed to have foreground pixels directly surrounding the 
white runs 51. A "maximal white rectangle" is defined as the largest rectangular area 
that can be constructed from the adjacent white runs 51, hence a rectangular white 
area that cannot be extended without including black (foreground) pixels. A maximal
white rectangle 52 is shown based on the four white runs 51 having a length as
indicated by the vertical dotted lines and a width of 4 pixels. When a white rectangle
cannot be extended, it has a so-called maximal separating power. Such a rectangle
is not a smaller part of a more significant white area. Hence, in this example, the 
rectangle 52 is the only possible maximal rectangle of width 4. Further rectangles 
can be constructed of width 3 or 2. A further example is shown in Figure 6. 

[041] The construction of white rectangles is done separately in different 
separation directions, e.g. horizontal and vertical white rectangles. Vertical white 
rectangles are detected by rotating the image, and detecting horizontal white runs for 
the rotated image. It is noted that, depending on the type of image or application,
other separation directions may also be selected, such as diagonal.
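The rotation trick can be illustrated with the sketch below, which assumes a list-of-rows image and half-open rectangle coordinates ((x_1, y_1), (x_2, y_2)). The helper names `rotate_cw` and `unrotate_rect` are hypothetical; the second one maps a horizontal rectangle found in the rotated image back to the corresponding vertical rectangle in the original image.

```python
def rotate_cw(image):
    """Rotate a list-of-rows image by 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def unrotate_rect(rect, original_height):
    """Map a rectangle from the clockwise-rotated image back to the original.

    A pixel at (row r, column c) in the original lands at
    (row c, column original_height - 1 - r) after the rotation,
    which gives the coordinate swap below for half-open rectangles.
    """
    (x1, y1), (x2, y2) = rect
    return ((y1, original_height - x2), (y2, original_height - x1))


# Column 1 is white in the two top rows only (0 = background).
image = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
    [1, 1, 1],
]
rotated = rotate_cw(image)
# In the rotated image the white pixels form a horizontal run in row 1,
# columns 2..3, i.e. the rectangle ((2, 1), (4, 2)).
vertical_rect = unrotate_rect(((2, 1), (4, 2)), original_height=len(image))
```

Any horizontal run detector can then be reused unchanged on `rotated`, with `unrotate_rect` applied to its output.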

[042] An algorithm for constructing maximal white rectangles according to 
the present invention is as follows. The input of the algorithm includes all horizontal 
one pixel high white runs (WR) detected from a given image. Each white run is
represented as a rectangle characterized by a set of coordinates ((x_1, y_1), (x_2, y_2)),
where x_1 and y_1 are the coordinates of the top left corner of the rectangle and x_2 and y_2
are the coordinates of the bottom right corner of the rectangle. Each white run
present in the active ordered object INPUT LIST is tested on an extension possibility. 
The extension possibility is formulated in the condition whether a given WR, labeled 
by p, can produce a maximal white rectangle (MWR) or not. If the extension 
possibility is FALSE, then p is already a maximal one, and p is deleted from the 
active INPUT LIST and written to the active RESULT LIST. If the extension 
possibility is TRUE, then the test for extension is repeated until all MWRs initiated by 
p have been constructed. Then p is deleted from the INPUT LIST and all MWRs 
obtained from p are written to the active RESULT LIST. When all white rectangles 
from the INPUT LIST have been processed, the RESULT LIST will contain all 
MWRs. To increase the efficiency of this algorithm, a sort on the y value is applied to 
the INPUT LIST. First, the algorithm is applied for horizontal WRs, i.e. for white runs 
with width larger than height. And after a 90° turn of the image it can be applied to 
vertical WRs. 

[043] In an embodiment, the algorithm for constructing the maximal 
rectangles is as follows. The rectangle data are stored as a linked list, with at least 
the coordinates of the rectangle vertices contained in it. The INPUT and RESULT 
LISTs are stored as a linked list too, with at least three elements, such as the 
number of white rectangles, and pointers on the first and the last element in the 
linked list. The following steps are executed: Activate INPUT LIST; Initiate RESULT 
LIST; and Initiate BUFFER for temporary coordinates of the selected rectangle. Start 
from the first white rectangle, labeled p_1, out of the active ordered INPUT LIST.
The next white rectangle on the list is labeled p_2. For each white rectangle on the
INPUT LIST, examine whether or not it has an extension possibility. For the active
white rectangle p_1, find the rectangles labeled p_{n_j}, j = 1, ..., l, on the active ordered
INPUT LIST, which satisfy

y_2(p_1) = y_1(p_{n_j}),

x_1(p_{n_j}) ≤ x_2(p_1), and

x_2(p_{n_j}) ≥ x_1(p_1).

[044] This search results in the set {p_{n_1}, p_{n_2}, ..., p_{n_l}}. Only if the set
{p_{n_1}, p_{n_2}, ..., p_{n_l}} is not empty, p_1 is said to have extension possibility.
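The test on extension possibility can be sketched as a simple filter over the INPUT LIST. The sketch assumes rectangles stored as ((x_1, y_1), (x_2, y_2)) tuples and reads the two overlap conditions as non-strict inequalities; the function name `extension_candidates` is illustrative only.

```python
def extension_candidates(p, input_list):
    """Return the rectangles that start where p ends and overlap p in x.

    A candidate q must satisfy y1(q) == y2(p), x1(q) <= x2(p) and
    x2(q) >= x1(p). An empty result means p has no extension possibility.
    """
    (px1, _py1), (px2, py2) = p
    return [
        q for q in input_list
        if q[0][1] == py2 and q[0][0] <= px2 and q[1][0] >= px1
    ]


# The four white runs of Figure 6.
wr1 = ((10, 1), (50, 2))
wr2 = ((10, 2), (50, 3))
wr3 = ((5, 3), (30, 4))
wr4 = ((40, 3), (60, 4))
input_list = [wr1, wr2, wr3, wr4]

first = extension_candidates(wr1, input_list)
later = extension_candidates(((10, 1), (50, 3)), input_list)
last = extension_candidates(wr3, input_list)
```

For the runs of Figure 6, the first run can only be extended with the second, while the combined rectangle ((10,1),(50,3)) has two candidates in the row below it.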

[045] If p_1 does not have an extension possibility, then p_1 is a maximal white
rectangle. Write p_1 to the RESULT LIST, remove p_1 from the INPUT LIST, and
proceed with p_2. If p_1 is extendible, then apply the extension procedure to p_1.
Proceed with p_2. We note here that p_1 can have an extension possibility while being
maximal itself.

[046] The Extension Procedure is as follows. Suppose p_1 has an extension
possibility, then there is the set {p_{n_1}, p_{n_2}, ..., p_{n_l}}. The extension procedure is applied to
each element of {p_{n_1}, p_{n_2}, ..., p_{n_l}} consistently. For the white rectangle p_1 which is
extendible with rectangle p_{n_j}, j = 1, ..., l, construct a new rectangle p_{1,n_j} with
coordinates:

x_1(p_{1,n_j}) = max { x_1(p_1), x_1(p_{n_j}) },

x_2(p_{1,n_j}) = min { x_2(p_1), x_2(p_{n_j}) },

y_1(p_{1,n_j}) = y_1(p_1), and

y_2(p_{1,n_j}) = y_2(p_{n_j}).

[047] Write the coordinates of p_{1,n_j}, j = 1, ..., l, to the "coordinates" buffer.
Repeat the test on extension possibility now for p_{1,n_j}. If the test is FALSE, p_{1,n_j} is
maximal; write p_{1,n_j} to the RESULT LIST. Otherwise, extend p_{1,n_j}.

[048] Before applying the extension procedure to p_{1,n_j}, we check p_1 and p_{n_j}
for absorption effect. The test of p_1 and p_{n_j} for absorption effect with p_{1,n_j} is as
follows. By absorption effect we mean the situation in which p_1, p_{n_j}, or both are
completely contained in p_{1,n_j}. In coordinates this means:

x_1(p_{1,n_j}) ≤ x_1(p_k),

x_2(p_{1,n_j}) ≥ x_2(p_k), where k = 1, n_j, and j = 1, ..., l.

[049] If the condition is TRUE for p_1, then p_1 is absorbed by p_{1,n_j}. Remove p_1
from the INPUT LIST. If the condition is TRUE for p_{n_j}, then p_{n_j} is absorbed by p_{1,n_j}.
Remove p_{n_j} from the INPUT LIST.
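The extension formula and the absorption test reduce to a few comparisons, as the following sketch shows. It assumes rectangles stored as ((x_1, y_1), (x_2, y_2)) tuples and non-strict inequalities in the absorption test; the names `extend` and `is_absorbed` are hypothetical.

```python
def extend(p, q):
    """Combine p with a candidate q lying directly below it.

    The new rectangle spans the common x range of both and runs from
    the top of p to the bottom of q.
    """
    (px1, py1), (px2, _py2) = p
    (qx1, _qy1), (qx2, qy2) = q
    return ((max(px1, qx1), py1), (min(px2, qx2), qy2))


def is_absorbed(k, extended):
    """True if the x range of k lies completely inside the extended rectangle."""
    return extended[0][0] <= k[0][0] and extended[1][0] >= k[1][0]


wr1 = ((10, 1), (50, 2))
wr2 = ((10, 2), (50, 3))
wr3 = ((5, 3), (30, 4))

p_1n1 = extend(wr1, wr2)      # both inputs are absorbed by the result
p_1n1_t1 = extend(p_1n1, wr3)  # narrower result, nothing is absorbed
```

Here `p_1n1` absorbs both of its inputs, so both can be dropped from the INPUT LIST, whereas `p_1n1_t1` is narrower than `p_1n1` and does not reach the left end of `wr3`, so neither is absorbed.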

[050] The algorithm assumes that the rectangle is wider than it is high, and 
thus the rectangles are primarily horizontal. To construct MWRs in the vertical 
direction, the original binary image is rotated by 90° clockwise. The algorithm 
mentioned above is repeated for the rotated image. As a result, all vertical MWRs for 
the original image are constructed. 

[051] Figure 6 shows a construction of maximal white rectangles according
to an embodiment of the present invention. The pixel coordinates are displayed
along a horizontal x axis and a vertical y axis. Four white runs 61 are shown left in
Figure 6. The white runs (WR) are described as rectangles with the coordinates of
their top left and bottom right corners, respectively:

WR_1: ((10,1),(50,2)),

WR_2: ((10,2),(50,3)),

WR_3: ((5,3),(30,4)), and

WR_4: ((40,3),(60,4)).

[052] All maximal white rectangles from these white runs are constructed. 
The resulting five maximal white rectangles (MWR) are shown in the right part of 
Figure 6 as indicated by 62, 63, 64, 65 and 66. The five MWR shown are the 
complete set of MWR for the WR given in the left part of Figure 6. A construction 
algorithm for constructing the maximal white rectangles according to the present 
invention is as follows. 

[053] As an example, let the INPUT LIST contain the four white runs 61. The
first element from the INPUT LIST is WR_1 ((10,1),(50,2)). Label WR_1 as p_1. Examine
p_1 on the extension possibility as described above. The first candidate for extension
is WR_2 ((10,2),(50,3)). Label WR_2 as p_{n_1}. Extend p_1 with p_{n_1} according to the formula
for extension above, which gives a new rectangle p_{1,n_1} with the coordinates
((10,1),(50,3)). Test p_1 and p_{n_1} on the absorption effect with p_{1,n_1}. As follows from the
absorption test, both p_1 and p_{n_1} are absorbed by p_{1,n_1}. Therefore, delete p_1 and p_{n_1}
from the INPUT LIST. Proceed with p_{1,n_1}. Test p_{1,n_1} on the extension possibility,
which gives the first candidate WR_3 ((5,3),(30,4)). Label WR_3 as p_{t_1}. Extend p_{1,n_1} with
p_{t_1} according to the extension formula. As a result, we obtain a new rectangle p_{(1,n_1),t_1}
with the coordinates ((10,1),(30,4)). Test p_{1,n_1} and p_{t_1} on the absorption effect with
p_{(1,n_1),t_1}. The test fails.

[054] Repeat the test on extension possibility for p_{(1,n_1),t_1}. The test fails, i.e.
p_{(1,n_1),t_1} has no extension possibility. It means that p_{(1,n_1),t_1} is maximal. Write p_{(1,n_1),t_1}
with the coordinates ((10,1),(30,4)) to the RESULT LIST.

[055] Proceed again with p_{1,n_1} and test it on extension possibility. The second
candidate WR_4 ((40,3),(60,4)) is found. Label WR_4 as p_{t_2}. Extend p_{1,n_1} with p_{t_2}
according to the extension formula. As a result, we obtain a new rectangle p_{(1,n_1),t_2}
with the coordinates ((40,1),(50,4)).

[056] Test p_{1,n_1} and p_{t_2} on the absorption effect with p_{(1,n_1),t_2}. The test fails,
i.e. no absorption. Repeat the test on extension possibility for p_{(1,n_1),t_2}; the test fails,
i.e. p_{(1,n_1),t_2} has no extension possibility. It means that p_{(1,n_1),t_2} is maximal. Write
p_{(1,n_1),t_2} with the coordinates ((40,1),(50,4)) to the RESULT LIST.

[057] Test p_{1,n_1} again on extension possibility. The test fails and p_{1,n_1} is
maximal. Write p_{1,n_1} with the coordinates ((10,1),(50,3)) to the RESULT LIST.

[058] Return to the INPUT LIST. The INPUT LIST at this stage contains two
white runs, i.e. WR_3: ((5,3),(30,4)) and WR_4: ((40,3),(60,4)). Start from WR_3, and label
it as p_2. Repeat the test on extension possibility for p_2. The test fails, so p_2 is maximal.
Write p_2 with the coordinates ((5,3),(30,4)) to the RESULT LIST. Remove p_2 from the
INPUT LIST.

[059] Proceed with WR_4 and label it as p_3. The test on extension possibility for p_3
shows that p_3 is maximal. Write p_3 with the coordinates ((40,3),(60,4)) to the
RESULT LIST. Remove p_3 from the INPUT LIST. Finally, the RESULT LIST
contains five maximal white rectangles, i.e. MWR_1: ((10,1),(50,3)) indicated in Figure
6 as 64, MWR_2: ((10,1),(30,4)) indicated as 62, MWR_3: ((40,1),(50,4)) indicated as
63, MWR_4: ((5,3),(30,4)) as 65, and MWR_5: ((40,3),(60,4)) as 66.
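The result of this worked example can be cross-checked with a short brute-force sketch. Note that this is not the INPUT LIST / RESULT LIST procedure described above: it swaps in a simpler band-intersection technique that, for every starting row, intersects the white runs of successive rows and keeps a rectangle only if it cannot grow upward or downward. It assumes half-open coordinates and that the runs within one row are disjoint and horizontally maximal; the function name `maximal_white_rectangles` is illustrative.

```python
def maximal_white_rectangles(runs):
    """Enumerate maximal white rectangles from one-pixel-high white runs."""
    rows = {}
    for (x1, y1), (x2, _y2) in runs:
        rows.setdefault(y1, []).append((x1, x2))

    def covered(y, x1, x2):
        # True if a single run in row y spans the whole x range.
        return any(a <= x1 and x2 <= b for a, b in rows.get(y, []))

    result = set()
    for top in rows:
        spans = set(rows[top])
        bottom = top + 1
        while spans:
            for x1, x2 in spans:
                # Keep the rectangle only if it cannot grow up or down.
                if not covered(top - 1, x1, x2) and not covered(bottom, x1, x2):
                    result.add(((x1, top), (x2, bottom)))
            # Narrow every span to its overlap with the runs of the next row.
            spans = {
                (max(a, c), min(b, d))
                for a, b in spans
                for c, d in rows.get(bottom, [])
                if max(a, c) < min(b, d)
            }
            bottom += 1
    return result


white_runs_61 = [
    ((10, 1), (50, 2)),
    ((10, 2), (50, 3)),
    ((5, 3), (30, 4)),
    ((40, 3), (60, 4)),
]
mwrs = maximal_white_rectangles(white_runs_61)
for rect in sorted(mwrs):
    print(rect)
```

For the four white runs 61 this yields the same five rectangles as the RESULT LIST above, which gives some confidence that the reconstructed coordinates are mutually consistent.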

[060] Figure 7 shows a next step in the method according to the invention, 
namely a cleaning step of overlapping maximal white rectangles. In the cleaning 
step, plural overlapping maximal white rectangles are consolidated into a single
so-called "Informative White Rectangle" (IWR) that combines the most relevant
properties of the original maximal white rectangles, as discussed below in detail.

[061] The cleaning step may further include steps like checking on size and 
spatial relation. The upper part of Figure 7 shows, as an example, two maximal white 
rectangles MWR1 and MWR2. The pair is consolidated into a single Informative 
White Rectangle IWR in the cleaning step as shown in the lower part of Figure 7. 
The process of detecting an overlap and consolidating is repeated until no relevant 
pairs can be formed anymore. A criterion for forming pairs may be the size of the 
overlap area. 

[062] Further, the cleaning step may include removing thin or short 
rectangles or rectangles that have an aspect ratio below a certain predefined value. 
The criteria for removing are based on the type of image, e.g. a width below a 
predefined number of pixels indicates a separator of text lines and is not relevant for 
separating fields, and a length below a certain value is not relevant in view of the
expected sizes of the fields. 

[063] An algorithm for the cleaning step according to the present invention is 
as follows. The start of the cleaning procedure is the whole set of MWRs constructed 
as described above with reference to Figures 5 and 6. The cleaning procedure is 
applied to discard non-informative MWRs. For this reason a measure of
non-informativeness is defined. For example, a long MWR is more informative than a
short one. A low aspect ratio indicates a more or less square rectangle that is less 
informative. Further, extremely thin rectangles, which for instance separate two text 
lines, must be excluded. First, all MWRs are classified as being horizontal, vertical or 
square by computing the ratio between their heights and widths. Square MWRs are 
deleted because of their non-informativeness. For the remaining horizontal and 
vertical MWRs, the cleaning technique is applied which includes three steps: 

- Each MWR with a length or width below a given value is deleted. 

- Each MWR with aspect ratio (AR), defined as the ratio of the longer 
side length divided by the shorter side length, below a given value is 
deleted. 

- For each pair of overlapping horizontal (or vertical) MWR_1
((x_1, y_1), (x_2, y_2)) and horizontal (or vertical) MWR_2 ((a_1, b_1), (a_2, b_2)), an
informative white rectangle IWR is constructed with the following
coordinates:

(a) Horizontal overlap:

x'_1 = min { x_1, a_1 },
y'_1 = max { y_1, b_1 },
x'_2 = max { x_2, a_2 },
y'_2 = min { y_2, b_2 }.

(b) Vertical overlap:

x'_1 = max { x_1, a_1 },
y'_1 = min { y_1, b_1 },
x'_2 = min { x_2, a_2 },
y'_2 = max { y_2, b_2 }.

[064] This process is repeated for all pairs of overlapping MWRs. The set of 
MWRs now comprises Informative White Rectangles (IWRs). These IWRs form the 
starting point for an algorithm for segmentation of the image into fields corresponding 
to the lay-out elements. The IWRs are potential field separators and are therefore 
called "separating elements". Using the IWRs, the algorithm constructs a graph for 
further processing into a geographical description of the image. 

[065] Figure 8 shows such a graph on a newspaper page. The picture in 
Figure 8 shows a down-sampled digital image 80 of a newspaper page. The original 
text is visible in black in a down-sampled version corresponding to Figure 2. The 
informative rectangles IWR constituting separating elements are shown in gray. For 
the construction of the graph, intersections of separating elements constituted by 
horizontal and vertical white IWRs are determined. The intersection point of two 
IWRs is indicated by a small black square representing a vertex 81 in the 
graph. Edges 82 that represent lines that separate the fields in the page are 
constructed by connecting pairs of vertices 81 via "field separators". The edges 82 of 
the graph are shown in white. The distance between the two vertices of an edge, i.e. 
the length, is assigned as weight to the edge for further processing. In an alternative 
embodiment a different parameter is used for assigning the weight, e.g. the colour of 
the pixels. An algorithm for constructing the graph is as follows. 

[066] At the beginning, the following notation and definitions for IWRs are 
given. Let $R = \{r_1, \ldots, r_m\}$ be the non-empty and finite set of all IWRs obtained from a 
given image I, where each IWR $r_\tau$ is specified by the x- and y-coordinates of its top left 
corner and bottom right corner, $((x_1^{(\tau)}, y_1^{(\tau)}), (x_2^{(\tau)}, y_2^{(\tau)}))$, $\tau = 1, 2, \ldots, m$, respectively. 
Each rectangle $r_\tau$ is classified as horizontal, vertical or square based on the ratio of 
its height and width. $H = \{h_1, \ldots, h_l\}$, $V = \{v_1, \ldots, v_k\}$, and $S = \{s_1, \ldots, s_d\}$ denote the 
subsets of horizontal, vertical and square IWRs, respectively, such that 

$H \cup V \cup S = R$ and $m = l + k + d$, and 

$H \cap V = \emptyset$, $V \cap S = \emptyset$, $H \cap S = \emptyset$, 

where it is assumed that 

$H \neq \emptyset$, $V \neq \emptyset$. 

[067] Further, the contents of S are ignored and only the subsets H and V are 
used. This is based on the consideration that in most cases, white spaces that form 
the border of text or non-text blocks are oblong vertical or horizontal areas. Let h be 
an element of H with coordinates $((x_1,y_1),(x_2,y_2))$ and v an element of V with 
coordinates $((a_1,b_1),(a_2,b_2))$. Then h and v have overlap if 

$$\begin{cases} x_1 \le a_2, \\ y_1 \le b_2, \\ x_2 \ge a_1, \\ y_2 \ge b_1. \end{cases}$$
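
By way of illustration only, the overlap condition may be sketched in Python as follows. The function name have_overlap and the tuple layout (x1, y1, x2, y2) are hypothetical choices for this sketch. The inequalities are read here as non-strict, which is an assumption; it allows two rectangles that touch in a single point to count as overlapping.

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2): top-left, bottom-right


def have_overlap(h: Rect, v: Rect) -> bool:
    """Overlap test between a horizontal IWR h and a vertical IWR v.

    Assumption: the inequalities are non-strict, so rectangles that
    share only a boundary point are reported as overlapping.
    """
    x1, y1, x2, y2 = h
    a1, b1, a2, b2 = v
    return x1 <= a2 and y1 <= b2 and x2 >= a1 and y2 >= b1
```

For example, a horizontal rectangle (0, 40, 200, 50) overlaps a vertical rectangle (90, 0, 100, 300) that crosses it, but not a vertical rectangle (210, 0, 220, 300) lying entirely to its right.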



[068] By the intersection point of h and v in case of overlap, we take the 
unique point P defined by the coordinates: 

$x_P = \tfrac{1}{2} ( \max\{x_1, a_1\} + \min\{x_2, a_2\} )$, 

$y_P = \tfrac{1}{2} ( \max\{y_1, b_1\} + \min\{y_2, b_2\} )$. 

[069] For IWRs, only two of all possible types of overlap occur, namely 
overlap resulting in a rectangle and overlap resulting in a point. Line overlap cannot 
occur, because this would be in contradiction with the concept of the MWRs. 

[070] Figure 9 shows two types of intersection of maximal rectangles. For 
constructing the graph, the intersection points of vertical and horizontal informative 
maximal rectangles are determined to find the position of vertices of the graph, i.e. to 
determine the exact coordinates of the vertices. The left part of Figure 9 shows a first 
type of intersection of a vertical IWR v and a horizontal IWR h, which results in a 
rectangular area 88 with its center at intersection point P. The right part of Figure 9 
shows a second type of intersection of a vertical IWR v and a horizontal IWR h, which 
results in a single intersection point 89 with a center of intersection at P'. 

[071] An algorithm for constructing the graph based on the intersection 
points is as follows. 

[072] $P = \{p_1, \ldots, p_N\}$ denotes the set of all intersection points of vertical IWRs 
and horizontal IWRs, where each p in P is specified by its x- and y-coordinates $(x_p, y_p)$, 
where p = 1, ..., N. Let the set P be found, and G = (X, A) an undirected graph having 
correspondence to P. The graph G = (X, A) includes a finite number of vertices X 
which are directly related to the intersection points and a finite number of edges A 
which describe the relation between intersection points. Mathematically this is 
expressed as 

$G(P) = ( X(P), A(P \times P) )$, 
$P: H \times V \to \{x_P, y_P\}$, 

where 

$X = \{1, \ldots, N\}$ and 

$A = (\{1, \ldots, N\} \times \{1, \ldots, N\})$ with 

$$A(i,j) = \begin{cases} \infty, & \text{if } i \text{ and } j \text{ are not 4-chain connected}, \\ d_{ij}, & \text{if } i \text{ and } j \text{ are 4-chain connected}, \end{cases}$$

where $d_{ij}$ indicates the Euclidean distance between points i and j, and where 4-chain 
connected means that the vertices of a rectangular block are connected in four 
possible directions of movement. In the above, two points i and j are 4-chain 
connected if they can be reached by walking around with the aid of 4-connected 
chain codes with min $d_{ij}$ in one direction. 
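
The construction of the graph from the intersection points may be illustrated by the following Python sketch. It is a sketch under assumptions rather than the definitive algorithm: the names intersection_point and build_graph are hypothetical, the overlap inequalities are taken as non-strict, and "4-chain connected" is interpreted here as linking each intersection point to its nearest neighbour along the same horizontal or vertical IWR. Pairs of vertices without an entry in the returned edge dictionary correspond to the weight ∞.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Point = Tuple[float, float]


def intersection_point(h: Rect, v: Rect) -> Optional[Point]:
    """Center of the overlap of h and v, or None if they do not overlap."""
    x1, y1, x2, y2 = h
    a1, b1, a2, b2 = v
    if not (x1 <= a2 and y1 <= b2 and x2 >= a1 and y2 >= b1):
        return None
    return (0.5 * (max(x1, a1) + min(x2, a2)),
            0.5 * (max(y1, b1) + min(y2, b2)))


def build_graph(H: List[Rect], V: List[Rect]
                ) -> Tuple[List[Point], Dict[Tuple[int, int], float]]:
    """Return the vertices and the weighted edges {(i, j): d_ij} with i < j."""
    points: List[Point] = []
    on_h = defaultdict(list)  # index of horizontal IWR -> vertex ids on it
    on_v = defaultdict(list)  # index of vertical IWR -> vertex ids on it
    for i, h in enumerate(H):
        for k, v in enumerate(V):
            p = intersection_point(h, v)
            if p is not None:
                vid = len(points)
                points.append(p)
                on_h[i].append(vid)
                on_v[k].append(vid)

    edges: Dict[Tuple[int, int], float] = {}
    # Sort along x for horizontal IWRs (axis 0) and along y for vertical
    # IWRs (axis 1), then link consecutive vertices only.
    chains = [(ids, 0) for ids in on_h.values()]
    chains += [(ids, 1) for ids in on_v.values()]
    for ids, axis in chains:
        ordered = sorted(ids, key=lambda n: points[n][axis])
        for a, b in zip(ordered, ordered[1:]):
            edges[(min(a, b), max(a, b))] = math.dist(points[a], points[b])
    return points, edges
```

With two horizontal IWRs along the top and bottom of a page and two vertical IWRs along its left and right sides, this sketch yields four vertices at the corners and four edges forming a rectangle; the two diagonals have no entry, i.e. weight ∞.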

[073] The graph as constructed may now be further processed for classifying 
the areas within the graph as text blocks or a similar classification depending on the 
type of picture. In an embodiment, the graph is augmented by including foreground 
separators, e.g. black lines or patterned lines such as dashed/dotted lines, in the 
analysis. Also, edges of photos or graphic objects which are detected can be 
included in the analysis. 

[074] The present segmenting method may also include a step of removing 
foreground separators. In this step, first, foreground separators are recognized and 
reconstructed as single objects. The components that constitute a patterned line are 
connected by analyzing element heuristics, spatial relation heuristics and line 
heuristics, i.e. building a combined element in a direction and detecting if it classifies 
as a line. A further method for reconstructing a solid line from a patterned line is 
down-sampling and/or using the Run Length Smoothing Algorithm (RLSA) as 
described by K.Y. Wong, R.G. Casey, F.M. Wahl in "Document analysis system", 
IBM J. Res. Dev. 26 (1982) 647-656. After detecting the foreground separators, they 
are replaced by background pixels. The effect is that larger maximal white rectangles 
can be constructed; the replacement equally supports any other suitable method that 
uses the background pixel property for finding background separators. 
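
The run-length smoothing and the subsequent replacement by background pixels may be illustrated, for a single pixel row, by the following Python sketch. It is a simplified one-dimensional illustration and not the full procedure of Wong et al.: the function names rlsa_row and erase_separators and the parameters max_gap and min_length are hypothetical, pixels are assumed to be 1 for foreground and 0 for background, and only background runs enclosed by foreground on both sides are filled.

```python
from typing import List


def rlsa_row(row: List[int], max_gap: int) -> List[int]:
    """Fill background runs of at most max_gap pixels between foreground pixels."""
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel
    for i, px in enumerate(row):
        if px:
            if last_fg is not None and 0 < i - last_fg - 1 <= max_gap:
                for k in range(last_fg + 1, i):
                    out[k] = 1
            last_fg = i
    return out


def erase_separators(row: List[int], max_gap: int, min_length: int) -> List[int]:
    """Set to background every run that is at least min_length long after smoothing."""
    smoothed = rlsa_row(row, max_gap)
    out = list(row)
    start = None  # start index of the current foreground run
    for i, px in enumerate(smoothed + [0]):  # sentinel closes a trailing run
        if px and start is None:
            start = i
        elif not px and start is not None:
            if i - start >= min_length:
                for k in range(start, i):
                    out[k] = 0
            start = None
    return out
```

In this sketch, a dashed line such as [0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0] is first smoothed with max_gap = 1 into one solid run of eight pixels. With min_length = 5, that run is then replaced by background, while the isolated foreground pixel near the end of the row is left untouched.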

[075] Figure 10 shows a device for segmenting a picture according to an 
embodiment of the present invention. The various methods of the present invention 
are implementable using the device of Figure 10 or other suitable devices. Referring 
to Figure 10, the device has an input unit 91 for entering a digital image. The input 
unit 91 may comprise a scanning unit for scanning an image from physical 
documents, such as an electro-optical scanner, and/or a digital communication unit 
for receiving the image from a network such as the Internet, and/or a playback unit for 
retrieving digital information from a record carrier, such as an optical disc drive. The input 
unit 91 is coupled to a processing unit 94, which cooperates with a memory unit 92. 
The processing unit 94 may comprise a general purpose computer central 
processing unit (CPU) and supporting circuits and operates using software for 
performing the segmentation as described above. The processing unit 94 may 
include a user interface 95 provided with control means such as a keyboard, a 
mouse device or operator buttons. The output of the processing unit 94 is coupled to 
a display unit 93. The display unit 93 may comprise a display screen, a printing unit 
for outputting a processed image on paper or other medium, and/or a recording unit 
for storing the segmented image on a record carrier like a magnetic tape or optical 
disk. 

[076] The processing steps of the present invention are implementable using 
existing computer programming languages. Such computer program(s) may be 
stored in memories such as RAM, ROM, PROM, etc. associated with computers. 
Alternatively, such computer program(s) may be stored in a different storage medium 
such as a magnetic disc, optical disc, magneto-optical disc, etc. Such computer 
program(s) may also take the form of a signal propagating across the Internet, 
extranet, intranet or other network and arriving at the destination device for storage 
and implementation. The computer programs are readable using a known computer 
or computer-based device. 

[077] Although the invention has been mainly explained by embodiments 
using a Japanese newspaper page as the digital image to be segmented, the 
invention is also suitable for any digital representation of any text or image having a 
layout in fields on a background, such as electrical circuits in layout images for IC 
design or streets and buildings on city maps. It is noted that in the present document, 
the use of the verb 'comprise' and its conjugations does not exclude the presence of 
other elements or steps that are not listed and the word 'a' or 'an' preceding an 
element does not exclude the presence of a plurality of such elements, that any 
reference signs do not limit the scope of the claims, that the invention and every unit 
or means mentioned may be implemented by suitable hardware and/or software and 
that several 'means' or 'units' may be represented by the same item. Further, the 
scope of the invention is not limited to the embodiments, and the invention lies in 
each and every novel feature or combination of features described above. 